{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Neural Networks\n",
    "===============\n",
    "\n",
    "Neural networks can be constructed using the ``torch.nn`` package.\n",
    "\n",
    "Now that you had a glimpse of ``autograd``, ``nn`` depends on\n",
    "``autograd`` to define models and differentiate them.\n",
    "An ``nn.Module`` contains layers, and a method ``forward(input)``\\ that\n",
    "returns the ``output``.\n",
    "\n",
    "For example, look at this network that classifies digit images:\n",
    "\n",
    ".. figure:: /_static/img/mnist.png\n",
    "   :alt: convnet\n",
    "\n",
    "   convnet\n",
    "\n",
    "It is a simple feed-forward network. It takes the input, feeds it\n",
    "through several layers one after the other, and then finally gives the\n",
    "output.\n",
    "\n",
    "A typical training procedure for a neural network is as follows:\n",
    "\n",
    "- Define the neural network that has some learnable parameters (or\n",
    "  weights)\n",
    "- Iterate over a dataset of inputs\n",
    "- Process input through the network\n",
    "- Compute the loss (how far is the output from being correct)\n",
    "- Propagate gradients back into the network’s parameters\n",
    "- Update the weights of the network, typically using a simple update rule:\n",
    "  ``weight = weight - learning_rate * gradient``\n",
    "\n",
    "Define the network\n",
    "------------------\n",
    "\n",
    "Let’s define this network:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Net(\n",
      "  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
      "  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
      "  (fc1): Linear(in_features=400, out_features=120, bias=True)\n",
      "  (fc2): Linear(in_features=120, out_features=84, bias=True)\n",
      "  (fc3): Linear(in_features=84, out_features=10, bias=True)\n",
      ")\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "\n",
    "class Net(nn.Module):\n",
    "\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        # 1 input image channel, 6 output channels, 5x5 square convolution\n",
    "        # kernel\n",
    "        self.conv1 = nn.Conv2d(1, 6, 5)\n",
    "        self.conv2 = nn.Conv2d(6, 16, 5)\n",
    "        # an affine operation: y = Wx + b\n",
    "        self.fc1 = nn.Linear(16 * 5 * 5, 120)\n",
    "        self.fc2 = nn.Linear(120, 84)\n",
    "        self.fc3 = nn.Linear(84, 10)\n",
    "\n",
    "    def forward(self, x):\n",
    "        # Max pooling over a (2, 2) window\n",
    "        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))\n",
    "        # If the size is a square you can only specify a single number\n",
    "        x = F.max_pool2d(F.relu(self.conv2(x)), 2)\n",
    "        x = x.view(-1, self.num_flat_features(x))\n",
    "        x = F.relu(self.fc1(x))\n",
    "        x = F.relu(self.fc2(x))\n",
    "        x = self.fc3(x)\n",
    "        return x\n",
    "\n",
    "    def num_flat_features(self, x):\n",
    "        size = x.size()[1:]  # all dimensions except the batch dimension\n",
    "        num_features = 1\n",
    "        for s in size:\n",
    "            num_features *= s\n",
    "        return num_features\n",
    "\n",
    "\n",
    "net = Net()\n",
    "print(net)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You just have to define the ``forward`` function, and the ``backward``\n",
    "function (where gradients are computed) is automatically defined for you\n",
    "using ``autograd``.\n",
    "You can use any of the Tensor operations in the ``forward`` function.\n",
    "\n",
    "The learnable parameters of a model are returned by ``net.parameters()``\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10\n",
      "torch.Size([6, 1, 5, 5])\n"
     ]
    }
   ],
   "source": [
    "params = list(net.parameters())\n",
    "print(len(params))\n",
    "print(params[0].size())  # conv1's .weight"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let try a random 32x32 input\n",
    "Note: Expected input size to this net(LeNet) is 32x32. To use this net on\n",
    "MNIST dataset, please resize the images from the dataset to 32x32.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor([[ 0.0845, -0.0100,  0.1020, -0.0397,  0.0485,  0.0917, -0.0860,\n",
      "          0.0980,  0.0847, -0.0202]])\n"
     ]
    }
   ],
   "source": [
    "input = torch.randn(1, 1, 32, 32)\n",
    "out = net(input)\n",
    "print(out)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Zero the gradient buffers of all parameters and backprops with random\n",
    "gradients:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "net.zero_grad()\n",
    "out.backward(torch.randn(1, 10))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div class=\"alert alert-info\"><h4>Note</h4><p>``torch.nn`` only supports mini-batches. The entire ``torch.nn``\n",
    "    package only supports inputs that are a mini-batch of samples, and not\n",
    "    a single sample.\n",
    "\n",
    "    For example, ``nn.Conv2d`` will take in a 4D Tensor of\n",
    "    ``nSamples x nChannels x Height x Width``.\n",
    "\n",
    "    If you have a single sample, just use ``input.unsqueeze(0)`` to add\n",
    "    a fake batch dimension.</p></div>\n",
    "\n",
    "Before proceeding further, let's recap all the classes you’ve seen so far.\n",
    "\n",
    "**Recap:**\n",
    "  -  ``torch.Tensor`` - A *multi-dimensional array* with support for autograd\n",
    "     operations like ``backward()``. Also *holds the gradient* w.r.t. the\n",
    "     tensor.\n",
    "  -  ``nn.Module`` - Neural network module. *Convenient way of\n",
    "     encapsulating parameters*, with helpers for moving them to GPU,\n",
    "     exporting, loading, etc.\n",
    "  -  ``nn.Parameter`` - A kind of Tensor, that is *automatically\n",
    "     registered as a parameter when assigned as an attribute to a*\n",
    "     ``Module``.\n",
    "  -  ``autograd.Function`` - Implements *forward and backward definitions\n",
    "     of an autograd operation*. Every ``Tensor`` operation, creates at\n",
    "     least a single ``Function`` node, that connects to functions that\n",
    "     created a ``Tensor`` and *encodes its history*.\n",
    "\n",
    "**At this point, we covered:**\n",
    "  -  Defining a neural network\n",
    "  -  Processing inputs and calling backward\n",
    "\n",
    "**Still Left:**\n",
    "  -  Computing the loss\n",
    "  -  Updating the weights of the network\n",
    "\n",
    "Loss Function\n",
    "-------------\n",
    "A loss function takes the (output, target) pair of inputs, and computes a\n",
    "value that estimates how far away the output is from the target.\n",
    "\n",
    "There are several different\n",
    "`loss functions <http://pytorch.org/docs/nn.html#loss-functions>`_ under the\n",
    "nn package .\n",
    "A simple loss is: ``nn.MSELoss`` which computes the mean-squared error\n",
    "between the input and the target.\n",
    "\n",
    "For example:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tensor(38.1561)\n"
     ]
    }
   ],
   "source": [
    "output = net(input)\n",
    "target = torch.arange(1, 11)  # a dummy target, for example\n",
    "target = target.view(1, -1)  # make it the same shape as output\n",
    "criterion = nn.MSELoss()\n",
    "\n",
    "loss = criterion(output, target)\n",
    "print(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, if you follow ``loss`` in the backward direction, using its\n",
    "``.grad_fn`` attribute, you will see a graph of computations that looks\n",
    "like this:\n",
    "\n",
    "::\n",
    "\n",
    "    input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d\n",
    "          -> view -> linear -> relu -> linear -> relu -> linear\n",
    "          -> MSELoss\n",
    "          -> loss\n",
    "\n",
    "So, when we call ``loss.backward()``, the whole graph is differentiated\n",
    "w.r.t. the loss, and all Tensors in the graph that has ``requres_grad=True``\n",
    "will have their ``.grad`` Tensor accumulated with the gradient.\n",
    "\n",
    "For illustration, let us follow a few steps backward:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<MseLossBackward object at 0x7f9e3b83f470>\n",
      "<AddmmBackward object at 0x7f9e3b83f588>\n",
      "<ExpandBackward object at 0x7f9e3b83f470>\n"
     ]
    }
   ],
   "source": [
    "print(loss.grad_fn)  # MSELoss\n",
    "print(loss.grad_fn.next_functions[0][0])  # Linear\n",
    "print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Backprop\n",
    "--------\n",
    "To backpropagate the error all we have to do is to ``loss.backward()``.\n",
    "You need to clear the existing gradients though, else gradients will be\n",
    "accumulated to existing gradients.\n",
    "\n",
    "\n",
    "Now we shall call ``loss.backward()``, and have a look at conv1's bias\n",
    "gradients before and after the backward.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "conv1.bias.grad before backward\n",
      "tensor([ 0.,  0.,  0.,  0.,  0.,  0.])\n",
      "conv1.bias.grad after backward\n",
      "tensor([ 0.0380,  0.0051,  0.1086, -0.0498, -0.0551,  0.0115])\n"
     ]
    }
   ],
   "source": [
    "net.zero_grad()     # zeroes the gradient buffers of all parameters\n",
    "\n",
    "print('conv1.bias.grad before backward')\n",
    "print(net.conv1.bias.grad)\n",
    "\n",
    "loss.backward()\n",
    "\n",
    "print('conv1.bias.grad after backward')\n",
    "print(net.conv1.bias.grad)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we have seen how to use loss functions.\n",
    "\n",
    "**Read Later:**\n",
    "\n",
    "  The neural network package contains various modules and loss functions\n",
    "  that form the building blocks of deep neural networks. A full list with\n",
    "  documentation is `here <http://pytorch.org/docs/nn>`_.\n",
    "\n",
    "**The only thing left to learn is:**\n",
    "\n",
    "  - Updating the weights of the network\n",
    "\n",
    "Update the weights\n",
    "------------------\n",
    "The simplest update rule used in practice is the Stochastic Gradient\n",
    "Descent (SGD):\n",
    "\n",
    "     ``weight = weight - learning_rate * gradient``\n",
    "\n",
    "We can implement this using simple python code:\n",
    "\n",
    ".. code:: python\n",
    "\n",
    "    learning_rate = 0.01\n",
    "    for f in net.parameters():\n",
    "        f.data.sub_(f.grad.data * learning_rate)\n",
    "\n",
    "However, as you use neural networks, you want to use various different\n",
    "update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.\n",
    "To enable this, we built a small package: ``torch.optim`` that\n",
    "implements all these methods. Using it is very simple:\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch.optim as optim\n",
    "\n",
    "# create your optimizer\n",
    "optimizer = optim.SGD(net.parameters(), lr=0.01)\n",
    "\n",
    "# in your training loop:\n",
    "optimizer.zero_grad()   # zero the gradient buffers\n",
    "output = net(input)\n",
    "loss = criterion(output, target)\n",
    "loss.backward()\n",
    "optimizer.step()    # Does the update"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ".. Note::\n",
    "\n",
    "      Observe how gradient buffers had to be manually set to zero using\n",
    "      ``optimizer.zero_grad()``. This is because gradients are accumulated\n",
    "      as explained in `Backprop`_ section.\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
